Hadoop: Data Processing and Modelling by Garry Turkington & Tanmay Deshpande & Sandeep Karanth

Author: Garry Turkington & Tanmay Deshpande & Sandeep Karanth [Turkington, Garry]
Language: eng
Format: azw3
Publisher: Packt Publishing
Published: 2016-08-31T04:00:00+00:00


We can check the loaded data, as shown here:

hive> select * from census;
OK
1	Tanmay	10000
2	Sneha	12000
3	Sakalya	14000
4	Ramesh	3000
5	Rahul	4000
6	Rajesh	18000
7	Ram	3000
Time taken: 0.052 seconds, Fetched: 7 row(s)

Let's write a UDF that segregates people by income group. To do so, we need to create a Maven project and add the following dependency to it:

<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>1.2.1</version>
</dependency>

Here is the Java class that implements the income group function. To write a UDF, we extend the UDF class from the hive-exec JAR:

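package com.demo.hive.udfs;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

// Maps an integer income to an income group: "lower", "middle", or "upper".
public class IncomeClassifier extends UDF {

    // Hive locates evaluate() by reflection and calls it once per row.
    public Text evaluate(IntWritable income) {
        // Hive passes SQL NULL as a Java null; return NULL instead of throwing.
        if (income == null) {
            return null;
        }
        int value = income.get();
        Text incomeGroup = new Text();
        if (value <= 5000) {
            incomeGroup.set("lower");
        } else if (value <= 15000) {
            incomeGroup.set("middle");  // 5001 to 15000
        } else {
            incomeGroup.set("upper");   // 15001 and above
        }
        return incomeGroup;
    }
}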
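Before packaging the JAR, it can help to check the thresholds against the sample rows outside Hive. The following is a minimal plain-Java sketch that mirrors the boundaries used in IncomeClassifier (5000 and 15000) without any Hadoop types; the class name IncomeGroupSketch and the method classify are hypothetical and are not part of Hive or of the book's project.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical plain-Java mirror of IncomeClassifier's thresholds (no Hadoop types),
// used only to check the classification against the census rows without a Hive session.
public class IncomeGroupSketch {

    // Same boundaries as IncomeClassifier.evaluate: <=5000 lower, 5001..15000 middle, >=15001 upper.
    static String classify(int income) {
        if (income <= 5000) {
            return "lower";
        } else if (income <= 15000) {
            return "middle";
        }
        return "upper";
    }

    public static void main(String[] args) {
        // Name/income pairs copied from the census table queried earlier.
        Map<String, Integer> census = new LinkedHashMap<>();
        census.put("Tanmay", 10000);
        census.put("Sneha", 12000);
        census.put("Sakalya", 14000);
        census.put("Ramesh", 3000);
        census.put("Rahul", 4000);
        census.put("Rajesh", 18000);
        census.put("Ram", 3000);

        // Print name, income, and the group each row would be assigned.
        for (Map.Entry<String, Integer> row : census.entrySet()) {
            System.out.println(row.getKey() + "\t" + row.getValue() + "\t" + classify(row.getValue()));
        }
    }
}
```

Running main classifies Tanmay, Sneha, and Sakalya as middle; Ramesh, Rahul, and Ram as lower; and Rajesh as upper.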
